Network Stock Portfolio Optimization¶
Context and Problem Statement¶
Active investing in the asset management industry aims to beat the stock market's average returns: portfolio managers track a particular index and try to outperform it with portfolios of their own construction.
Portfolio construction involves selecting stocks that have a higher probability of delivering better returns than the tracked index, such as the S&P 500. In this project, we will use network analysis to select a basket of stocks and create two portfolios. We will then simulate portfolio value by investing a fixed amount, holding the portfolio for an entire year, and comparing the result against the S&P 500 index.
In this project we follow the approach described in the research paper below:
Proposed Approach¶
- Collect the price data for all S&P 500 components from 2011 till 2020
- Compute log returns for the S&P 500 components for same time period
- Compute the correlation matrix for the above log returns
- Find out the Top n central and peripheral stocks based on the following network topological parameters:
- Degree centrality
- Betweenness centrality
- Distance on degree criterion
- Distance on correlation criterion
- Distance on distance criterion
- Simulate the performance of central and peripheral portfolios against the performance of S&P 500 for the year 2021
Loading the Libraries¶
We first need to install the pandas_datareader library using !pip install pandas_datareader
!pip install pandas_datareader
Requirement already satisfied: pandas_datareader in c:\users\max power\anaconda3\lib\site-packages (0.10.0)
import tqdm
import requests
import numpy as np
import pandas as pd
import seaborn as sns
import networkx as nx
import plotly.express as px
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt
import pandas_datareader.data as web
import os
import warnings
warnings.filterwarnings('ignore')
Getting the S&P 500 Components¶
Beautiful Soup is a library that makes it easy to scrape information from web pages.
import requests
from bs4 import BeautifulSoup
import pandas as pd
# Find tickers of companies in the S&P 500 from Wikipedia
url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
# Always include a User-Agent to avoid being blocked
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36'}
resp = requests.get(url, headers=headers)
# Parse with BeautifulSoup
soup = BeautifulSoup(resp.text, 'lxml')
# Verify which tables are present
tables = soup.find_all('table')
print(f"Found {len(tables)} tables on the page")
# The S&P 500 table is usually the first one
table = tables[0]
# Convert to DataFrame for convenience (pandas 2.x expects a file-like object, not a raw HTML string)
from io import StringIO
df = pd.read_html(StringIO(str(table)))[0]
# Extract tickers
tickers = df['Symbol'].str.replace('.', '-', regex=False).tolist()
print(f"Extracted {len(tickers)} tickers.")
#show first 15 elements
print(tickers[:15])
Found 2 tables on the page Extracted 503 tickers. ['MMM', 'AOS', 'ABT', 'ABBV', 'ACN', 'ADBE', 'AMD', 'AES', 'AFL', 'A', 'APD', 'ABNB', 'AKAM', 'ALB', 'ARE']
Getting the Price Data for all the S&P 500 components in the last 10 years¶
# Download daily price data for all tickers from Yahoo Finance
# price_data = web.DataReader(tickers, 'yahoo', start='2011-01-01', end='2020-12-31')
# price_data = price_data['Adj Close']  # keep only the adjusted close (the download also includes open, high, low, close and volume)
# price_data.to_csv('snp500_price_data_2011_to_2020.csv')
# Build the path to the file saved in the Data folder
current_path = os.getcwd()  # the notebook's working directory
main_dir = os.path.dirname(current_path)
file_path = os.path.join(main_dir, "Data", "snp500_price_data_2011_to_2020.csv")
df = pd.read_csv(file_path, index_col=[0])
df.head()
| MMM | AOS | ABT | ABBV | ABMD | ACN | ATVI | ADM | ADBE | ADP | ... | XEL | XLNX | XYL | YUM | ZBRA | ZBH | ZION | ZTS | CEG | OGN | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | |||||||||||||||||||||
| 2010-12-31 | 63.855606 | 8.113162 | 17.986767 | NaN | 9.61 | 39.143620 | 11.245819 | 22.385578 | 30.780001 | 31.271172 | ... | 16.221039 | 23.216919 | NaN | 28.547478 | 37.990002 | 49.212429 | 21.089169 | NaN | NaN | NaN |
| 2011-01-03 | 64.218163 | 8.125947 | 17.952976 | NaN | 9.80 | 39.224346 | 11.318138 | 22.623722 | 31.290001 | 31.791464 | ... | 16.227924 | 23.569420 | NaN | 28.570745 | 38.200001 | 50.395058 | 21.907326 | NaN | NaN | NaN |
| 2011-01-04 | 64.129395 | 8.100383 | 18.121916 | NaN | 9.80 | 38.966022 | 11.327178 | 22.608845 | 31.510000 | 31.676586 | ... | 16.296804 | 23.665552 | NaN | 28.134232 | 37.840000 | 49.725819 | 21.550467 | NaN | NaN | NaN |
| 2011-01-05 | 64.129395 | 8.285738 | 18.121916 | NaN | 10.03 | 38.974094 | 11.110217 | 22.713020 | 32.220001 | 32.183357 | ... | 16.200367 | 23.745670 | NaN | 28.268110 | 37.799999 | 49.762482 | 21.672325 | NaN | NaN | NaN |
| 2011-01-06 | 63.737186 | 8.289999 | 18.084377 | NaN | 10.05 | 39.119389 | 11.083097 | 23.583750 | 32.270000 | 32.433369 | ... | 16.186602 | 24.146229 | NaN | 28.465996 | 37.480000 | 48.222305 | 21.611391 | NaN | NaN | NaN |
5 rows × 505 columns
Missing Data due to Index Rebalancing¶
# Identify stocks with missing data
figure = plt.figure(figsize=(16, 8))
sns.heatmap(df.T.isnull());
The missing data arises because certain stocks left the S&P 500 at some point during the period, and other stocks entered the index to replace them.
Cleaning the Dataset of Null Values¶
price_data_cleaned = df.dropna(axis=1) # dropping null values columnwise
figure = plt.figure(figsize=(16, 8))
sns.heatmap(price_data_cleaned.T.isnull());
With the null-value columns dropped, the data is clean, and the heatmap confirms there are no missing values left.
Getting Yearwise Data¶
def get_year_wise_snp_500_data(data, year):
year_wise_data = data.loc['{}-01-01'.format(year):'{}-12-31'.format(year)]
return year_wise_data
# Getting year wise data of S&P stocks from 2011 to 2020 -> divide df into 1 df per year
snp_500_2011 = get_year_wise_snp_500_data(price_data_cleaned, 2011)
snp_500_2012 = get_year_wise_snp_500_data(price_data_cleaned, 2012)
snp_500_2013 = get_year_wise_snp_500_data(price_data_cleaned, 2013)
snp_500_2014 = get_year_wise_snp_500_data(price_data_cleaned, 2014)
snp_500_2015 = get_year_wise_snp_500_data(price_data_cleaned, 2015)
snp_500_2016 = get_year_wise_snp_500_data(price_data_cleaned, 2016)
snp_500_2017 = get_year_wise_snp_500_data(price_data_cleaned, 2017)
snp_500_2018 = get_year_wise_snp_500_data(price_data_cleaned, 2018)
snp_500_2019 = get_year_wise_snp_500_data(price_data_cleaned, 2019)
snp_500_2020 = get_year_wise_snp_500_data(price_data_cleaned, 2020)
snp_500_2011
| MMM | AOS | ABT | ABMD | ACN | ATVI | ADM | ADBE | ADP | AAP | ... | WHR | WMB | WTW | WYNN | XEL | XLNX | YUM | ZBRA | ZBH | ZION | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | |||||||||||||||||||||
| 2011-01-03 | 64.218163 | 8.125947 | 17.952976 | 9.800000 | 39.224346 | 11.318138 | 22.623722 | 31.290001 | 31.791464 | 62.732765 | ... | 67.926483 | 11.205445 | 93.139076 | 77.632828 | 16.227924 | 23.569420 | 28.570745 | 38.200001 | 50.395058 | 21.907326 |
| 2011-01-04 | 64.129395 | 8.100383 | 18.121916 | 9.800000 | 38.966022 | 11.327178 | 22.608845 | 31.510000 | 31.676586 | 59.610516 | ... | 66.950417 | 11.123855 | 91.576157 | 80.054619 | 16.296804 | 23.665552 | 28.134232 | 37.840000 | 49.725819 | 21.550467 |
| 2011-01-05 | 64.129395 | 8.285738 | 18.121916 | 10.030000 | 38.974094 | 11.110217 | 22.713020 | 32.220001 | 32.183357 | 59.687115 | ... | 67.400887 | 11.141987 | 92.847679 | 81.087433 | 16.200367 | 23.745670 | 28.268110 | 37.799999 | 49.762482 | 21.672325 |
| 2011-01-06 | 63.737186 | 8.289999 | 18.084377 | 10.050000 | 39.119389 | 11.083097 | 23.583750 | 32.270000 | 32.433369 | 57.723724 | ... | 65.966843 | 11.119320 | 93.112579 | 81.678650 | 16.186602 | 24.146229 | 28.465996 | 37.480000 | 48.222305 | 21.611391 |
| 2011-01-07 | 63.803787 | 8.409311 | 18.159452 | 9.890000 | 39.183979 | 10.938456 | 23.777237 | 32.040001 | 32.507687 | 59.265705 | ... | 65.689018 | 11.296103 | 92.953644 | 84.570557 | 16.331245 | 24.010040 | 28.821018 | 37.599998 | 48.213131 | 21.385098 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2011-12-23 | 62.401131 | 8.806977 | 21.821796 | 18.469999 | 43.558022 | 11.196037 | 22.084383 | 28.290001 | 37.792553 | 67.537323 | ... | 39.581589 | 15.226687 | 103.337746 | 82.537613 | 19.524267 | 26.499556 | 35.055527 | 36.520000 | 48.781536 | 14.242986 |
| 2011-12-27 | 62.461864 | 8.845891 | 21.903597 | 18.360001 | 43.590965 | 11.168506 | 22.069183 | 28.500000 | 37.869064 | 68.181442 | ... | 36.047947 | 15.277890 | 103.761589 | 85.186325 | 19.825743 | 26.450407 | 35.215870 | 36.590000 | 48.845718 | 14.304041 |
| 2011-12-28 | 61.604034 | 8.612420 | 21.747778 | 18.250000 | 43.533318 | 11.131798 | 21.560009 | 28.020000 | 37.423878 | 67.546921 | ... | 35.854633 | 14.956691 | 102.516556 | 81.952354 | 19.710886 | 26.188274 | 35.025822 | 35.700001 | 48.735703 | 14.033657 |
| 2011-12-29 | 62.332783 | 8.837245 | 21.942547 | 18.379999 | 44.340408 | 11.287809 | 21.841192 | 28.309999 | 37.806461 | 67.633461 | ... | 36.589203 | 15.152204 | 102.966888 | 82.792717 | 19.890350 | 26.409443 | 35.382153 | 35.980000 | 48.992397 | 14.373814 |
| 2011-12-30 | 62.044331 | 8.672951 | 21.903597 | 18.469999 | 43.838036 | 11.306164 | 21.734802 | 28.270000 | 37.569965 | 66.941238 | ... | 36.689724 | 15.370993 | 102.781456 | 82.905281 | 19.840096 | 26.262003 | 35.043640 | 35.779999 | 48.974060 | 14.199376 |
252 rows × 450 columns
snp_500_2011.shift(1)
| MMM | AOS | ABT | ABMD | ACN | ATVI | ADM | ADBE | ADP | AAP | ... | WHR | WMB | WTW | WYNN | XEL | XLNX | YUM | ZBRA | ZBH | ZION | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | |||||||||||||||||||||
| 2011-01-03 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2011-01-04 | 64.218163 | 8.125947 | 17.952976 | 9.800000 | 39.224346 | 11.318138 | 22.623722 | 31.290001 | 31.791464 | 62.732765 | ... | 67.926483 | 11.205445 | 93.139076 | 77.632828 | 16.227924 | 23.569420 | 28.570745 | 38.200001 | 50.395058 | 21.907326 |
| 2011-01-05 | 64.129395 | 8.100383 | 18.121916 | 9.800000 | 38.966022 | 11.327178 | 22.608845 | 31.510000 | 31.676586 | 59.610516 | ... | 66.950417 | 11.123855 | 91.576157 | 80.054619 | 16.296804 | 23.665552 | 28.134232 | 37.840000 | 49.725819 | 21.550467 |
| 2011-01-06 | 64.129395 | 8.285738 | 18.121916 | 10.030000 | 38.974094 | 11.110217 | 22.713020 | 32.220001 | 32.183357 | 59.687115 | ... | 67.400887 | 11.141987 | 92.847679 | 81.087433 | 16.200367 | 23.745670 | 28.268110 | 37.799999 | 49.762482 | 21.672325 |
| 2011-01-07 | 63.737186 | 8.289999 | 18.084377 | 10.050000 | 39.119389 | 11.083097 | 23.583750 | 32.270000 | 32.433369 | 57.723724 | ... | 65.966843 | 11.119320 | 93.112579 | 81.678650 | 16.186602 | 24.146229 | 28.465996 | 37.480000 | 48.222305 | 21.611391 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2011-12-23 | 61.467400 | 8.679438 | 21.677664 | 18.520000 | 43.302727 | 10.920726 | 21.810799 | 27.889999 | 37.444744 | 66.700890 | ... | 39.094479 | 15.007899 | 101.907288 | 81.066956 | 19.380699 | 26.450407 | 34.675449 | 36.380001 | 48.625694 | 14.085989 |
| 2011-12-27 | 62.401131 | 8.806977 | 21.821796 | 18.469999 | 43.558022 | 11.196037 | 22.084383 | 28.290001 | 37.792553 | 67.537323 | ... | 39.581589 | 15.226687 | 103.337746 | 82.537613 | 19.524267 | 26.499556 | 35.055527 | 36.520000 | 48.781536 | 14.242986 |
| 2011-12-28 | 62.461864 | 8.845891 | 21.903597 | 18.360001 | 43.590965 | 11.168506 | 22.069183 | 28.500000 | 37.869064 | 68.181442 | ... | 36.047947 | 15.277890 | 103.761589 | 85.186325 | 19.825743 | 26.450407 | 35.215870 | 36.590000 | 48.845718 | 14.304041 |
| 2011-12-29 | 61.604034 | 8.612420 | 21.747778 | 18.250000 | 43.533318 | 11.131798 | 21.560009 | 28.020000 | 37.423878 | 67.546921 | ... | 35.854633 | 14.956691 | 102.516556 | 81.952354 | 19.710886 | 26.188274 | 35.025822 | 35.700001 | 48.735703 | 14.033657 |
| 2011-12-30 | 62.332783 | 8.837245 | 21.942547 | 18.379999 | 44.340408 | 11.287809 | 21.841192 | 28.309999 | 37.806461 | 67.633461 | ... | 36.589203 | 15.152204 | 102.966888 | 82.792717 | 19.890350 | 26.409443 | 35.382153 | 35.980000 | 48.992397 | 14.373814 |
252 rows × 450 columns
Computing the Daily Log Returns¶
Statistically, stock prices are commonly assumed to follow a log-normal distribution, which makes log returns approximately normal. It is therefore plausible to use properties of the normal distribution in statistical estimation for log returns, but not for simple returns.
Stock return analysis is a form of time series analysis, in which stationarity also matters; log returns are typically much closer to stationary than raw prices or simple returns.
# Calculating daily log returns: r_t = ln(P_t) - ln(P_{t-1}), using shift(1) to access the previous day's price
log_returns_2011 = np.log(snp_500_2011) - np.log(snp_500_2011.shift(1))
log_returns_2012 = np.log(snp_500_2012) - np.log(snp_500_2012.shift(1))
log_returns_2013 = np.log(snp_500_2013) - np.log(snp_500_2013.shift(1))
log_returns_2014 = np.log(snp_500_2014) - np.log(snp_500_2014.shift(1))
log_returns_2015 = np.log(snp_500_2015) - np.log(snp_500_2015.shift(1))
log_returns_2016 = np.log(snp_500_2016) - np.log(snp_500_2016.shift(1))
log_returns_2017 = np.log(snp_500_2017) - np.log(snp_500_2017.shift(1))
log_returns_2018 = np.log(snp_500_2018) - np.log(snp_500_2018.shift(1))
log_returns_2019 = np.log(snp_500_2019) - np.log(snp_500_2019.shift(1))
log_returns_2020 = np.log(snp_500_2020) - np.log(snp_500_2020.shift(1))
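As a quick illustration of daily log returns (using the conventional ln(P_t / P_{t-1}) orientation), here is a minimal sketch on a toy price series with hypothetical values. It also shows the property that makes log returns convenient: they are additive over time.

```python
import numpy as np
import pandas as pd

# Toy price series (hypothetical values) to illustrate daily log returns.
prices = pd.Series([100.0, 105.0, 102.0, 108.0])

# Conventional definition: r_t = ln(P_t) - ln(P_{t-1}) = ln(P_t / P_{t-1})
log_returns = np.log(prices / prices.shift(1)).dropna()

# Log returns are additive over time: summing the daily log returns
# recovers the log return over the whole holding period.
total = np.log(prices.iloc[-1] / prices.iloc[0])
assert np.isclose(log_returns.sum(), total)
```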
Computing the Correlation of Returns¶
See which stocks move together (positive correlations are indicated by light colors)
# Computing adjacency matrix:
return_correlation_2011 = log_returns_2011.corr()
return_correlation_2012 = log_returns_2012.corr()
return_correlation_2013 = log_returns_2013.corr()
return_correlation_2014 = log_returns_2014.corr()
return_correlation_2015 = log_returns_2015.corr()
return_correlation_2016 = log_returns_2016.corr()
return_correlation_2017 = log_returns_2017.corr()
return_correlation_2018 = log_returns_2018.corr()
return_correlation_2019 = log_returns_2019.corr()
return_correlation_2020 = log_returns_2020.corr()
return_correlations = {
    2011: return_correlation_2011, 2012: return_correlation_2012,
    2013: return_correlation_2013, 2014: return_correlation_2014,
    2015: return_correlation_2015, 2016: return_correlation_2016,
    2017: return_correlation_2017, 2018: return_correlation_2018,
    2019: return_correlation_2019, 2020: return_correlation_2020,
}
figure, axes = plt.subplots(5, 2, figsize=(30, 30))
for ax, (year, corr) in zip(axes.flat, return_correlations.items()):
    sns.heatmap(corr, ax=ax)
    ax.set_title("Return Correlation - {}".format(year))
Inferences¶
The first plot, for the year 2011, shows high correlation among the stocks. The market turmoil of 2011 (the U.S. debt-ceiling crisis and the European sovereign debt crisis) brought volatility, and stock prices moved down together, which is the reason for the high correlation.
In comparatively stable years such as 2012, 2014 and 2017, the correlation among stocks is low.
In 2020, the COVID pandemic and the accompanying volatility again pushed stock prices up or down together, producing high correlations.
From this we can infer that in stable market conditions the correlation matrices show low correlation values, whereas in stressed market conditions they show high correlation values.
Creating Graphs¶
graph_2011 = nx.Graph(return_correlation_2011)
figure = plt.figure(figsize=(22, 10))
nx.draw_networkx(graph_2011, with_labels=False)
This is a fully connected network, because we created it directly from the correlation matrix.
In a fully connected network every node is linked to all the other nodes; since the correlation matrix has a unit diagonal, the graph also contains self-loops.
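A minimal sketch (with a hypothetical 3×3 correlation matrix) showing the structure `nx.Graph` builds from a dense correlation DataFrame:

```python
import pandas as pd
import networkx as nx

# Toy 3x3 correlation matrix (hypothetical values).
corr = pd.DataFrame(
    [[1.0, 0.5, 0.2],
     [0.5, 1.0, 0.3],
     [0.2, 0.3, 1.0]],
    index=list("ABC"), columns=list("ABC"))

g = nx.Graph(corr)

# Every off-diagonal entry becomes an edge, and the unit diagonal
# becomes a self-loop on each node.
assert nx.number_of_selfloops(g) == 3
assert g.number_of_edges() == 6  # 3 node pairs + 3 self-loops
```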
Filtering Graphs using MST¶
MST - Minimum Spanning Tree
A minimum spanning tree (MST), or minimum weight spanning tree, is a subset of the edges of a connected, edge-weighted undirected graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight. That is, it is a spanning tree whose sum of edge weights is as small as possible.
The MST is a popular technique for eliminating redundancy and noise while maintaining the significant links in the network.
Note that in removing redundancy and noise with the MST, we may lose some information as well.
You can find more on MST here
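A small sketch on a toy weighted graph (hypothetical weights) showing what the MST keeps and drops:

```python
import networkx as nx

# Small weighted graph: the MST keeps the n-1 lightest edges that
# connect all nodes without forming cycles.
g = nx.Graph()
g.add_weighted_edges_from([
    ("A", "B", 1.0), ("B", "C", 2.0),
    ("A", "C", 3.0), ("C", "D", 1.5),
])

mst = nx.minimum_spanning_tree(g)

assert mst.number_of_edges() == g.number_of_nodes() - 1  # 3 edges, no cycles
assert ("A", "C") not in mst.edges()  # heaviest edge on the A-B-C cycle is dropped
```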
distance_2011 = np.sqrt(2 * (1 - return_correlation_2011))
distance_2012 = np.sqrt(2 * (1 - return_correlation_2012))
distance_2013 = np.sqrt(2 * (1 - return_correlation_2013))
distance_2014 = np.sqrt(2 * (1 - return_correlation_2014))
distance_2015 = np.sqrt(2 * (1 - return_correlation_2015))
distance_2016 = np.sqrt(2 * (1 - return_correlation_2016))
distance_2017 = np.sqrt(2 * (1 - return_correlation_2017))
distance_2018 = np.sqrt(2 * (1 - return_correlation_2018))
distance_2019 = np.sqrt(2 * (1 - return_correlation_2019))
distance_2020 = np.sqrt(2 * (1 - return_correlation_2020))
Before the construction of the MST graph, the correlation coefficient is converted into a distance.
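As a sanity check on the transform d = sqrt(2 * (1 - ρ)) used above: it maps perfectly correlated stocks to distance 0, uncorrelated stocks to sqrt(2), and perfectly anti-correlated stocks to the maximum distance 2.

```python
import numpy as np

# The distance transform used above: d = sqrt(2 * (1 - rho)).
def correlation_to_distance(rho):
    return np.sqrt(2.0 * (1.0 - rho))

assert np.isclose(correlation_to_distance(1.0), 0.0)       # perfect correlation
assert np.isclose(correlation_to_distance(0.0), np.sqrt(2.0))  # uncorrelated
assert np.isclose(correlation_to_distance(-1.0), 2.0)      # perfect anti-correlation
```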
distance_2011_graph = nx.Graph(distance_2011)
distance_2012_graph = nx.Graph(distance_2012)
distance_2013_graph = nx.Graph(distance_2013)
distance_2014_graph = nx.Graph(distance_2014)
distance_2015_graph = nx.Graph(distance_2015)
distance_2016_graph = nx.Graph(distance_2016)
distance_2017_graph = nx.Graph(distance_2017)
distance_2018_graph = nx.Graph(distance_2018)
distance_2019_graph = nx.Graph(distance_2019)
distance_2020_graph = nx.Graph(distance_2020)
graph_2011_filtered = nx.minimum_spanning_tree(distance_2011_graph)
graph_2012_filtered = nx.minimum_spanning_tree(distance_2012_graph)
graph_2013_filtered = nx.minimum_spanning_tree(distance_2013_graph)
graph_2014_filtered = nx.minimum_spanning_tree(distance_2014_graph)
graph_2015_filtered = nx.minimum_spanning_tree(distance_2015_graph)
graph_2016_filtered = nx.minimum_spanning_tree(distance_2016_graph)
graph_2017_filtered = nx.minimum_spanning_tree(distance_2017_graph)
graph_2018_filtered = nx.minimum_spanning_tree(distance_2018_graph)
graph_2019_filtered = nx.minimum_spanning_tree(distance_2019_graph)
graph_2020_filtered = nx.minimum_spanning_tree(distance_2020_graph)
We choose the MST method to filter out the network graph in each window so as to eliminate the redundancies and noise, and still maintain significant links.
figure, axes = plt.subplots(10, 1, figsize=(24, 120))
nx.draw_networkx(graph_2011_filtered, with_labels=False, ax=axes[0])
nx.draw_networkx(graph_2012_filtered, with_labels=False, ax=axes[1])
nx.draw_networkx(graph_2013_filtered, with_labels=False, ax=axes[2])
nx.draw_networkx(graph_2014_filtered, with_labels=False, ax=axes[3])
nx.draw_networkx(graph_2015_filtered, with_labels=False, ax=axes[4])
nx.draw_networkx(graph_2016_filtered, with_labels=False, ax=axes[5])
nx.draw_networkx(graph_2017_filtered, with_labels=False, ax=axes[6])
nx.draw_networkx(graph_2018_filtered, with_labels=False, ax=axes[7])
nx.draw_networkx(graph_2019_filtered, with_labels=False, ax=axes[8])
nx.draw_networkx(graph_2020_filtered, with_labels=False, ax=axes[9])
On plotting the graphs, we see that the network looks different every year; no two yearly graphs look very similar.
Computing Graph Statistics over Time¶
# Are stocks more or less aggregated over time?
average_shortest_path_length = []
year = [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
for graph in [graph_2011_filtered, graph_2012_filtered, graph_2013_filtered, graph_2014_filtered, graph_2015_filtered,
              graph_2016_filtered, graph_2017_filtered, graph_2018_filtered, graph_2019_filtered, graph_2020_filtered]:
    average_shortest_path_length.append(nx.average_shortest_path_length(graph))
figure = plt.figure(figsize=(22, 8))
sns.lineplot(x='year', y='average_shortest_path_length',
data=pd.DataFrame({'year': year, 'average_shortest_path_length': average_shortest_path_length}));
From the above plot we can see that the average shortest path length was relatively stable until 2015, increased significantly in 2016 and 2017, decreased in 2018, and increased again in 2020.
Portfolio Construction¶
log_returns_2011_till_2020 = np.log(price_data_cleaned) - np.log(price_data_cleaned.shift(1))
return_correlation_2011_till_2020 = log_returns_2011_till_2020.corr()
figure = plt.figure(figsize=(24, 8))
sns.heatmap(return_correlation_2011_till_2020);
distance_2011_till_2020 = np.sqrt(2 * (1 - return_correlation_2011_till_2020))
distance_2011_till_2020_graph = nx.Graph(distance_2011_till_2020)
distance_2011_till_2020_graph_filtered = nx.minimum_spanning_tree(distance_2011_till_2020_graph)
figure = plt.figure(figsize=(24, 8))
nx.draw_kamada_kawai(distance_2011_till_2020_graph_filtered, with_labels=False)
degree_centrality = nx.degree_centrality(distance_2011_till_2020_graph_filtered)
closeness_centrality = nx.closeness_centrality(distance_2011_till_2020_graph_filtered)
betweenness_centrality = nx.betweenness_centrality(distance_2011_till_2020_graph_filtered)
eigenvector_centrality=nx.eigenvector_centrality_numpy(distance_2011_till_2020_graph_filtered)
dc_data = (pd.DataFrame(list(degree_centrality.items()),
                        columns=['stocks', 'degree_centrality'])
             .sort_values('degree_centrality', ascending=False))
px.bar(data_frame=dc_data, x='stocks', y='degree_centrality', template='plotly_dark')
The bar chart ranks the degree centrality scores of all stocks in the network from 2011 to 2020. The steep initial drop-off shows that only a few stocks maintain strong and widespread correlations, while the majority have relatively weak or specialized connections.
At the top of the ranking, HON (Honeywell International) exhibits the highest degree centrality, meaning it shares significant correlations with the largest number of other stocks in the dataset. This suggests that Honeywell acted as a central connector in the market network during the 2011–2020 period.
This could reflect its diversified business model — spanning industrials, aerospace, and technology — allowing it to be sensitive to and co-move with broader market trends. In other words, HON’s performance was highly interconnected with the general market dynamics, making it a reliable indicator of systemic movements within that period.
Conclusion¶
Based on the degree centrality analysis:
HON (Honeywell International) emerges as a major hub in the stock correlation network between 2011 and 2020.
Its high number of connections suggests that Honeywell’s price movements were strongly synchronized with those of many other firms, highlighting its influential role in market-wide behavior.
The distribution of centrality scores also points to a core–periphery structure in the market: a few highly connected “core” stocks exert broad influence, while most others remain loosely connected in the periphery.
cc_data = (pd.DataFrame(list(closeness_centrality.items()),
                        columns=['stocks', 'closeness_centrality'])
             .sort_values('closeness_centrality', ascending=False))
px.bar(data_frame=cc_data, x='stocks', y='closeness_centrality', template='plotly_dark')
Closeness centrality also relies on the shortest paths between all pairs of stocks in the network.
It is defined as the reciprocal of the average shortest-path distance between a stock and all other stocks reachable from it.
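A minimal sketch of that definition on a three-node path graph (toy node labels):

```python
import networkx as nx

# Path graph A - B - C: the middle node is "closest" to everyone.
# Closeness is the reciprocal of the average shortest-path distance
# to all other reachable nodes.
g = nx.path_graph(["A", "B", "C"])
closeness = nx.closeness_centrality(g)

# B: distances {A: 1, C: 1} -> average 1.0 -> closeness 1.0
# A: distances {B: 1, C: 2} -> average 1.5 -> closeness 2/3
assert closeness["B"] == 1.0
assert abs(closeness["A"] - 2 / 3) < 1e-9
```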
bc_data = (pd.DataFrame(list(betweenness_centrality.items()),
                        columns=['stocks', 'betweenness_centrality'])
             .sort_values('betweenness_centrality', ascending=False))
px.bar(data_frame=bc_data, x='stocks', y='betweenness_centrality', template='plotly_dark')
Betweenness centrality is the sum, over all pairs of stocks, of the fraction of shortest paths between the pair that pass through a given stock. It quantifies a stock's control over the flow of information in the network.
The stock with the highest score is therefore considered significant in terms of its role in coordinating information among stocks.
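The extreme case makes the definition concrete: in a star graph, every shortest path between two leaves passes through the hub.

```python
import networkx as nx

# Star graph: the hub lies on every shortest path between leaf pairs,
# so its normalized betweenness is 1; leaves never sit between others.
g = nx.star_graph(4)  # node 0 is the hub, nodes 1-4 are leaves
betweenness = nx.betweenness_centrality(g)

assert betweenness[0] == 1.0
assert all(betweenness[leaf] == 0.0 for leaf in [1, 2, 3, 4])
```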
Preliminary Observations¶
Between 2011 and 2020, network analysis of stock correlations reveals that Honeywell (HON) held the highest degree centrality, meaning it was directly connected to the largest number of other stocks and strongly mirrored overall market movements.
In contrast, Pentair (PNR), Emerson Electric (EMR), and Danaher (DHR) exhibited the highest closeness centrality, positioning them at the structural core of the market where they could efficiently influence or reflect broader trends through short, indirect connections.
Finally, Procter & Gamble (PG), Colgate-Palmolive (CL), and General Dynamics (GD) ranked highest in betweenness centrality, acting as key bridges that link different market sectors and facilitate the flow of information across the network.
From a portfolio optimization perspective, this suggests combining central, diversified industrial leaders like HON, EMR, and DHR for exposure to general market dynamics with connector stocks like PG and CL to capture cross-sector influence, while balancing them with lower-centrality stocks to mitigate systemic risk.
Selecting Stocks based on Network Topological Parameters¶
# we already computed degree centrality above
# we already computed betweenness centrality above
# distance on degree criterion
distance_degree_criteria = {}
node_with_largest_degree_centrality = max(degree_centrality, key=degree_centrality.get)
for node in distance_2011_till_2020_graph_filtered.nodes():
distance_degree_criteria[node] = nx.shortest_path_length(distance_2011_till_2020_graph_filtered, node,
node_with_largest_degree_centrality)
# distance on correlation criterion
distance_correlation_criteria = {}
sum_correlation = {}
for node in distance_2011_till_2020_graph_filtered.nodes():
neighbors = nx.neighbors(distance_2011_till_2020_graph_filtered, node)
sum_correlation[node] = sum(return_correlation_2011_till_2020[node][neighbor] for neighbor in neighbors)
node_with_highest_correlation = max(sum_correlation, key=sum_correlation.get)
for node in distance_2011_till_2020_graph_filtered.nodes():
distance_correlation_criteria[node] = nx.shortest_path_length(distance_2011_till_2020_graph_filtered, node,
node_with_highest_correlation)
# distance on distance criterion
distance_distance_criteria = {}
mean_distance = {}
for node in distance_2011_till_2020_graph_filtered.nodes():
nodes = list(distance_2011_till_2020_graph_filtered.nodes())
nodes.remove(node)
distance_distance = [nx.shortest_path_length(distance_2011_till_2020_graph_filtered, node, ns) for ns in nodes]
mean_distance[node] = np.mean(distance_distance)
node_with_minimum_mean_distance = min(mean_distance, key=mean_distance.get)
for node in distance_2011_till_2020_graph_filtered.nodes():
distance_distance_criteria[node] = nx.shortest_path_length(distance_2011_till_2020_graph_filtered, node,
node_with_minimum_mean_distance)
Distance refers to the smallest length from a node to the central node of the network.
Here, three types of definitions of central node are introduced to reduce the error caused by a single method.
Therefore three types of distances are described here.
1. Distance on degree criterion (Ddegree), the central node is the one that has the largest degree.
2. Distance on correlation criterion (Dcorrelation), the central node is the one with the highest value of the sum of correlation coefficients with its neighbors.
3. Distance on distance criterion (Ddistance), the central node is the one that produces the lowest value for the mean distance.
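The first of these criteria can be sketched on a toy tree (hypothetical node labels): pick the node with the largest degree as the centre, then measure each node's shortest-path distance to it.

```python
import networkx as nx

# Toy tree: "HUB" has the largest degree (3), so it is the centre
# under the distance-on-degree criterion.
g = nx.Graph([("HUB", "A"), ("HUB", "B"), ("HUB", "C"), ("C", "D")])

degrees = dict(g.degree())
centre = max(degrees, key=degrees.get)
d_degree = {n: nx.shortest_path_length(g, n, centre) for n in g.nodes()}

assert centre == "HUB"
assert d_degree == {"HUB": 0, "A": 1, "B": 1, "C": 1, "D": 2}
```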
node_stats = pd.DataFrame.from_dict(degree_centrality, orient='index')
node_stats.columns = ['degree_centrality']
# Map by node label rather than relying on dict iteration order
node_stats['betweenness_centrality'] = node_stats.index.map(betweenness_centrality)
node_stats['average_centrality'] = 0.5 * (node_stats['degree_centrality'] + node_stats['betweenness_centrality'])
node_stats['distance_degree_criteria'] = node_stats.index.map(distance_degree_criteria)
node_stats['distance_correlation_criteria'] = node_stats.index.map(distance_correlation_criteria)
node_stats['distance_distance_criteria'] = node_stats.index.map(distance_distance_criteria)
node_stats['average_distance'] = (node_stats['distance_degree_criteria'] + node_stats['distance_correlation_criteria'] +
node_stats['distance_distance_criteria']) / 3
node_stats.head()
| degree_centrality | betweenness_centrality | average_centrality | distance_degree_criteria | distance_correlation_criteria | distance_distance_criteria | average_distance | |
|---|---|---|---|---|---|---|---|
| MMM | 0.002227 | 0.000000 | 0.001114 | 2 | 2 | 7 | 3.666667 |
| AOS | 0.004454 | 0.022073 | 0.013264 | 6 | 6 | 5 | 5.666667 |
| ABT | 0.008909 | 0.056584 | 0.032746 | 9 | 9 | 8 | 8.666667 |
| ABMD | 0.002227 | 0.000000 | 0.001114 | 10 | 10 | 9 | 9.666667 |
| ACN | 0.006682 | 0.008899 | 0.007790 | 16 | 16 | 9 | 13.666667 |
We use the parameters defined above to select the portfolios.
Nodes in the largest 10% by degree or betweenness centrality are chosen for the central portfolio, while nodes whose degree equals 1 or whose betweenness centrality equals 0 are candidates for the peripheral portfolio.
Similarly, nodes ranking in the top 10% by distance are assigned to the peripheral portfolio, and those in the bottom 10% to the central portfolio.
The central portfolios and peripheral portfolios represent two opposite sides of correlation and agglomeration. Generally speaking, central stocks play a vital role in the market and impose a strong influence on other stocks. On the other hand, the correlations between peripheral stocks are weak and contain much more noise than those of the central stocks.
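The 10% cut-offs described above can be implemented with pandas quantiles. Below is a minimal sketch on synthetic scores (the tickers and values are made up; our notebook instead simply takes the top 15 stocks by each average, as shown next):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic stand-in for node_stats: one row per hypothetical stock.
stats = pd.DataFrame({
    "average_centrality": rng.random(100),
    "average_distance": rng.random(100),
}, index=[f"S{i:03d}" for i in range(100)])

# Central portfolio: stocks in the top 10% by centrality.
central_cut = stats["average_centrality"].quantile(0.9)
central = stats.index[stats["average_centrality"] >= central_cut].tolist()

# Peripheral portfolio: stocks in the top 10% by distance to the central node.
peripheral_cut = stats["average_distance"].quantile(0.9)
peripheral = stats.index[stats["average_distance"] >= peripheral_cut].tolist()
```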
central_stocks = node_stats.sort_values('average_centrality', ascending=False).head(15)
central_portfolio = central_stocks.index.tolist()
peripheral_stocks = node_stats.sort_values('average_distance', ascending=False).head(15)
peripheral_portfolio = peripheral_stocks.index.tolist()
central_stocks
| degree_centrality | betweenness_centrality | average_centrality | distance_degree_criteria | distance_correlation_criteria | distance_distance_criteria | average_distance | |
|---|---|---|---|---|---|---|---|
| PRU | 0.011136 | 0.639884 | 0.325510 | 7 | 7 | 0 | 4.666667 |
| AMP | 0.028953 | 0.540198 | 0.284576 | 5 | 5 | 2 | 4.000000 |
| LNC | 0.015590 | 0.526050 | 0.270820 | 6 | 6 | 1 | 4.333333 |
| AME | 0.020045 | 0.517430 | 0.268737 | 4 | 4 | 3 | 3.666667 |
| GL | 0.017817 | 0.452504 | 0.235160 | 8 | 8 | 1 | 5.666667 |
| PH | 0.031180 | 0.389924 | 0.210552 | 2 | 2 | 5 | 3.000000 |
| EMR | 0.006682 | 0.414403 | 0.210542 | 3 | 3 | 4 | 3.333333 |
| TFC | 0.008909 | 0.400791 | 0.204850 | 10 | 10 | 3 | 7.666667 |
| USB | 0.006682 | 0.391624 | 0.199153 | 9 | 9 | 2 | 6.666667 |
| PNC | 0.008909 | 0.353911 | 0.181410 | 11 | 11 | 4 | 8.666667 |
| JPM | 0.011136 | 0.351078 | 0.181107 | 12 | 12 | 5 | 9.666667 |
| PFG | 0.011136 | 0.345023 | 0.178079 | 8 | 8 | 1 | 5.666667 |
| BRK-B | 0.008909 | 0.332216 | 0.170563 | 13 | 13 | 6 | 10.666667 |
| HST | 0.006682 | 0.331202 | 0.168942 | 9 | 9 | 2 | 6.666667 |
| ADP | 0.011136 | 0.312888 | 0.162012 | 14 | 14 | 7 | 11.666667 |
peripheral_stocks
| degree_centrality | betweenness_centrality | average_centrality | distance_degree_criteria | distance_correlation_criteria | distance_distance_criteria | average_distance | |
|---|---|---|---|---|---|---|---|
| CHD | 0.002227 | 0.000000 | 0.001114 | 24 | 24 | 17 | 21.666667 |
| CLX | 0.004454 | 0.004454 | 0.004454 | 23 | 23 | 16 | 20.666667 |
| CAG | 0.002227 | 0.000000 | 0.001114 | 23 | 23 | 16 | 20.666667 |
| CPB | 0.004454 | 0.004454 | 0.004454 | 22 | 22 | 15 | 19.666667 |
| K | 0.002227 | 0.000000 | 0.001114 | 22 | 22 | 15 | 19.666667 |
| HRL | 0.002227 | 0.000000 | 0.001114 | 22 | 22 | 15 | 19.666667 |
| SJM | 0.002227 | 0.000000 | 0.001114 | 22 | 22 | 15 | 19.666667 |
| KMB | 0.004454 | 0.008889 | 0.006672 | 22 | 22 | 15 | 19.666667 |
| ATO | 0.002227 | 0.000000 | 0.001114 | 21 | 21 | 14 | 18.666667 |
| MNST | 0.002227 | 0.000000 | 0.001114 | 21 | 21 | 14 | 18.666667 |
| GIS | 0.011136 | 0.022162 | 0.016649 | 21 | 21 | 14 | 18.666667 |
| WBA | 0.002227 | 0.000000 | 0.001114 | 21 | 21 | 14 | 18.666667 |
| CL | 0.004454 | 0.013303 | 0.008879 | 21 | 21 | 14 | 18.666667 |
| EVRG | 0.002227 | 0.000000 | 0.001114 | 21 | 21 | 14 | 18.666667 |
| ABC | 0.002227 | 0.000000 | 0.001114 | 21 | 21 | 14 | 18.666667 |
Selecting the top 15 stocks for both Central Stocks and Peripheral Stocks¶
# Color nodes by membership: central = red, peripheral = green, others = blue
color = []
for node in distance_2011_till_2020_graph_filtered:
if node in central_portfolio:
color.append('red')
elif node in peripheral_portfolio:
color.append('green')
else:
color.append('blue')
figure = plt.figure(figsize=(24, 8))
nx.draw_kamada_kawai(distance_2011_till_2020_graph_filtered, with_labels=False, node_color=color)
Here, the red stocks are the central portfolio stocks, and the green ones are the peripheral portfolio stocks.
Performance Evaluation¶
Here we evaluate performance by comparing the Central Portfolio, the Peripheral Portfolio, and the S&P 500 index over 2021 to find out which performs best.
# Collecting data for all S&P 500 components for the year 2021 (one-time download):
# %time price_data_2021 = web.DataReader(tickers, 'yahoo', start='2021-01-01', end='2021-12-31')
# price_data_2021 = price_data_2021['Adj Close']
# price_data_2021.to_csv('snp500_price_data_2021.csv')
# Build the path to the file saved in the Data folder
main_dir = os.path.dirname(current_path)
file_path = os.path.join(main_dir, "Data", "snp500_price_data_2021.csv")
price_data_2021 = pd.read_csv(file_path, index_col=[0])
price_data_2021.head()
| MMM | AOS | ABT | ABBV | ABMD | ACN | ATVI | ADM | ADBE | ADP | ... | WYNN | XEL | XLNX | XYL | YUM | ZBRA | ZBH | ZION | ZTS | CEG | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | |||||||||||||||||||||
| 2020-12-31 | 169.412521 | 53.700413 | 107.444366 | 101.195663 | 324.200012 | 257.353546 | 92.402596 | 49.218819 | 500.119995 | 172.915405 | ... | 112.830002 | 64.846153 | 141.770004 | 100.827858 | 106.770256 | 384.329987 | 153.096832 | 42.901466 | 164.329178 | NaN |
| 2021-01-04 | 166.582382 | 52.818787 | 107.071472 | 99.552361 | 316.730011 | 252.673630 | 89.466812 | 48.691574 | 485.339996 | 165.810379 | ... | 106.900002 | 63.863796 | 142.429993 | 98.747719 | 104.075439 | 378.130005 | 152.172821 | 42.397789 | 162.432663 | NaN |
| 2021-01-05 | 166.301315 | 53.161644 | 108.396248 | 100.581787 | 322.600006 | 254.112091 | 90.253006 | 49.638653 | 485.690002 | 165.349136 | ... | 110.190002 | 63.241299 | 144.229996 | 98.628838 | 104.085266 | 380.570007 | 154.805740 | 43.069359 | 163.564590 | NaN |
| 2021-01-06 | 168.831009 | 54.993446 | 108.170555 | 99.712914 | 321.609985 | 256.890442 | 87.575966 | 51.649975 | 466.309998 | 164.770126 | ... | 110.849998 | 64.641907 | 141.220001 | 102.789139 | 104.655716 | 394.820007 | 159.217133 | 47.908611 | 165.967484 | NaN |
| 2021-01-07 | 164.498520 | 55.669361 | 109.220566 | 100.780106 | 323.559998 | 259.314148 | 89.237915 | 51.191086 | 477.739990 | 165.702438 | ... | 109.750000 | 63.377476 | 149.710007 | 107.454628 | 103.859055 | 409.100006 | 158.273254 | 49.370266 | 165.818558 | NaN |
5 rows × 505 columns
snp_500_2021 = web.DataReader(['sp500'], 'fred', start='2021-01-01', end='2021-12-31')
# Drop stocks with any missing 2021 prices (e.g. CEG) and missing index rows:
price_data_2021 = price_data_2021.dropna(axis=1)
snp_500_2021 = snp_500_2021.dropna()
price_data_2021.head()
| MMM | AOS | ABT | ABBV | ABMD | ACN | ATVI | ADM | ADBE | ADP | ... | WTW | WYNN | XEL | XLNX | XYL | YUM | ZBRA | ZBH | ZION | ZTS | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | |||||||||||||||||||||
| 2020-12-31 | 169.412521 | 53.700413 | 107.444366 | 101.195663 | 324.200012 | 257.353546 | 92.402596 | 49.218819 | 500.119995 | 172.915405 | ... | 210.679993 | 112.830002 | 64.846153 | 141.770004 | 100.827858 | 106.770256 | 384.329987 | 153.096832 | 42.901466 | 164.329178 |
| 2021-01-04 | 166.582382 | 52.818787 | 107.071472 | 99.552361 | 316.730011 | 252.673630 | 89.466812 | 48.691574 | 485.339996 | 165.810379 | ... | 203.699997 | 106.900002 | 63.863796 | 142.429993 | 98.747719 | 104.075439 | 378.130005 | 152.172821 | 42.397789 | 162.432663 |
| 2021-01-05 | 166.301315 | 53.161644 | 108.396248 | 100.581787 | 322.600006 | 254.112091 | 90.253006 | 49.638653 | 485.690002 | 165.349136 | ... | 202.000000 | 110.190002 | 63.241299 | 144.229996 | 98.628838 | 104.085266 | 380.570007 | 154.805740 | 43.069359 | 163.564590 |
| 2021-01-06 | 168.831009 | 54.993446 | 108.170555 | 99.712914 | 321.609985 | 256.890442 | 87.575966 | 51.649975 | 466.309998 | 164.770126 | ... | 203.699997 | 110.849998 | 64.641907 | 141.220001 | 102.789139 | 104.655716 | 394.820007 | 159.217133 | 47.908611 | 165.967484 |
| 2021-01-07 | 164.498520 | 55.669361 | 109.220566 | 100.780106 | 323.559998 | 259.314148 | 89.237915 | 51.191086 | 477.739990 | 165.702438 | ... | 205.250000 | 109.750000 | 63.377476 | 149.710007 | 107.454628 | 103.859055 | 409.100006 | 158.273254 | 49.370266 | 165.818558 |
5 rows × 503 columns
price_data_2021 = price_data_2021['2021-01-04':]
amount = 100000
# Buy `share` units of the central basket so its initial value equals `amount`.
central_portfolio_value = price_data_2021[central_portfolio]
portfolio_unit = central_portfolio_value.sum(axis=1).iloc[0]
share = amount / portfolio_unit
central_portfolio_value = central_portfolio_value.sum(axis=1) * share
# Same construction for the peripheral basket.
peripheral_portfolio_value = price_data_2021[peripheral_portfolio]
portfolio_unit = peripheral_portfolio_value.sum(axis=1).iloc[0]
share = amount / portfolio_unit
peripheral_portfolio_value = peripheral_portfolio_value.sum(axis=1) * share
snp_500_2021_value = snp_500_2021 * (amount / snp_500_2021.iloc[0])
# Copy so that adding columns does not also mutate snp_500_2021_value:
all_portfolios = snp_500_2021_value.copy()
all_portfolios['central_portfolio'] = central_portfolio_value.values
all_portfolios['peripheral_portfolio'] = peripheral_portfolio_value.values
# all_portfolios = pd.concat([snp_500_2021_value, central_portfolio_value, peripheral_portfolio_value], axis=1)
# all_portfolios.columns = ['snp500', 'central_portfolio', 'peripheral_portfolio']
all_portfolios.head()
| sp500 | central_portfolio | peripheral_portfolio | |
|---|---|---|---|
| DATE | |||
| 2021-01-04 | 100000.000000 | 100000.000000 | 100000.000000 |
| 2021-01-05 | 100708.253955 | 100426.138652 | 100062.699137 |
| 2021-01-06 | 101283.288071 | 104249.598589 | 100581.162028 |
| 2021-01-07 | 102787.077946 | 105184.349194 | 100174.816809 |
| 2021-01-08 | 103351.573372 | 105127.033059 | 100190.990005 |
figure, ax = plt.subplots(figsize=(16, 8))
ax.plot(all_portfolios['sp500'], label='S&P 500')
ax.plot(all_portfolios['central_portfolio'], label='Central Portfolio')
ax.plot(all_portfolios['peripheral_portfolio'], label='Peripheral Portfolio')
ax.legend(loc='upper left')
plt.show()
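To quantify the comparison in the plot, one can compute the total return of each series. A minimal sketch on made-up start and end values (the numbers below are illustrative only, not the notebook's actual results):

```python
import pandas as pd

# Hypothetical start- and end-of-year portfolio values mirroring the
# structure of `all_portfolios` (values are made up for illustration).
values = pd.DataFrame({
    "sp500": [100000.0, 126000.0],
    "central_portfolio": [100000.0, 132000.0],
    "peripheral_portfolio": [100000.0, 118000.0],
}, index=pd.to_datetime(["2021-01-04", "2021-12-31"]))

# Total return over the holding period = final value / initial value - 1.
total_return = values.iloc[-1] / values.iloc[0] - 1
print(total_return)
```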
As the plot shows, the central portfolio outperformed the S&P 500 in 2021, while the peripheral portfolio underperformed it.
Each portfolio has its own characteristics under different market conditions.
Generally, central stocks tend to perform better in stable market conditions, whereas peripheral stocks tend to hold up better in crisis conditions: because peripheral stocks are only weakly correlated with the rest of the network, they are less affected by market-wide moves.
Network analysis can also be used to rebalance the portfolio over time.
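A rebalancing schedule could re-estimate the network on a rolling window and re-select the baskets periodically. Below is a minimal sketch on synthetic prices, using a simple correlation-sum score as a stand-in for the full network construction (the tickers, the three-year window, and the scoring function are all assumptions for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
dates = pd.date_range("2015-01-01", periods=6 * 252, freq="B")
# Synthetic price paths for four hypothetical tickers.
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0.0, 0.01, (len(dates), 4)), axis=0)),
    index=dates, columns=["AAA", "BBB", "CCC", "DDD"])

def select_central(window, k=2):
    """Proxy for network centrality: rank stocks by the sum of their
    return correlations with all other stocks and keep the top k."""
    corr = window.pct_change().corr()
    score = corr.sum() - 1.0  # drop the self-correlation of 1
    return score.nlargest(k).index.tolist()

# At the start of each year, re-select the basket from the prior three years.
holdings = {year: select_central(prices.loc[f"{year - 3}":f"{year - 1}"])
            for year in range(2018, 2021)}
print(holdings)
```

In the actual workflow, `select_central` would be replaced by the full pipeline above (distance matrix, minimum spanning tree or threshold graph, centrality and distance criteria).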
Conclusions¶
The results show that the central portfolio, composed of highly connected and influential stocks such as Prudential Financial (PRU), Ameriprise Financial (AMP), and Emerson Electric (EMR), significantly outperformed both the peripheral portfolio and the S&P 500 in the 2021 simulation (with the network estimated on 2011-2020 data). This superior performance highlights how stocks occupying structurally central positions in the market network tend to capture systemic growth and benefit from broader economic momentum, effectively acting as "market amplifiers." In contrast, the peripheral portfolio, while offering some diversification, lagged behind due to its weaker ties to overall market trends.
A good strategy could be to overweight central stocks within the equity allocation to capitalize on their ability to outperform the market, especially during expansionary phases, while still including a select portion of peripheral or low-centrality stocks for protection and stability during downturns or sector-specific shocks. This balanced approach would let investors benefit from the strong growth potential of central market players while maintaining risk diversification and long-term portfolio resilience.